Abstract- Breast cancer is one of the most common malignancies in women and a rare malignancy in men. Women who are diagnosed at an early stage can survive this often deadly disease. Mammography provides the best screening modality for detecting early breast cancer, even before a lesion is palpable. Because of the pathology of malignant masses, the shape of a mammographic mass can be used to discriminate between malignant and benign masses. In this study, the use of shape features to classify breast masses is investigated, and a classification scheme is developed to classify masses as either benign or malignant. A mammogram database is also introduced. It is designed to store the images of the masses, the calculated shape descriptor parameters, and additional data required by BI-RADS, such as patient history, the category of the mass, and the biopsy report if one was performed. A Touch Memory system is used as a tool that permits access to the electronic patient record in the mammogram database. The software is written in Delphi and runs on Windows operating systems.
I. INTRODUCTION

Breast cancer is one of the most common malignancies in women and a rare malignancy in men. The earlier a breast malignancy is detected, the better the chance of a cure. It has been widely reported that breast cancer has become the second leading cause of cancer death among women, and a woman's lifetime risk of developing breast cancer is one in nine. However, the good news is that women who are diagnosed at an early stage can survive this often deadly disease. Mammography provides the best screening modality for detecting early breast cancer, even before a lesion is palpable.

Approximately 80 to 85% of localized breast cancers are diagnosed by the mammographic appearance of the tumor [1]. Because of the pathology of malignant masses, the shape of the mass can be used to discriminate between malignant and benign masses [2]. Shape analysis has been applied in computerized mammographic methods: Magnin et al., Davies and Dance, and Shen et al. used shape descriptors that included various measures of area, compactness, eccentricity, and convexity [3]. However, their methods were not applied to the classification of masses. The use of shape features to classify breast masses has also been investigated.

In this study, geometric parameters such as area, perimeter, circularity, normalized circularity, radial distance mean and standard deviation, area ratio, orientation, eccentricity, moment invariants, and Fourier descriptors up to order 10 are measured. The process starts with a segmentation phase, in which an expert radiologist segments the mammographic mass shapes in the mammographic database set. These pre-segmented mass shapes are then processed by a mass boundary detection algorithm to obtain the descriptive geometric parameters. In the final step, a carefully designed classification scheme is used to classify the masses as benign or malignant.

The results show that normalized circularity, area, and the Fourier coefficient A0 can be used successfully as features for classification. The developed software uses this finding to classify the masses automatically.

The next part of this study is the design of a mammogram database to store the images of the masses, the calculated shape descriptor parameters, and additional data required by the American College of Radiology's (ACR) Breast Imaging Reporting and Data System (BI-RADS), such as patient morphologic data, patient medical history, the category of the mass, and the biopsy report if one was performed. The database is designed as an Open Database Connectivity (ODBC) compliant relational database to support future uses, such as monitoring the growth of suspicious masses, telemedical services for sharing mass information, and statistical data analysis. The privacy of the patient's electronic record is protected by a Touch Memory system. The block diagram of the developed system is shown in Fig. 1.
Fig. 1. Block diagram of the developed system
II. Mass Shape Feature Extraction

Several qualitative and quantitative techniques have been developed for characterizing the shape of masses in an image. These techniques are useful both for classifying masses in a pattern recognition system and for describing masses symbolically in an image understanding system [4].

Because of the pathology of malignant masses, the shape of the mass can be used to discriminate between malignant and benign masses. In this study, geometric parameters such as area, perimeter, circularity, normalized circularity, radial distance mean and standard deviation, area ratio, orientation, eccentricity, moment invariants, and Fourier descriptors up to order 10 are measured.

In the first step, the mammographic mass shapes in the training data set, which consists of 60 cases (30 benign and 30 malignant), are segmented by an expert radiologist. The geometric parameters of the pre-segmented mass shapes in this training set are then computed automatically using software developed specifically for this purpose. A careful examination of the database and experimentation with various classification techniques show that normalized circularity, area, and the Fourier coefficient A0 can be used successfully to classify masses as benign or malignant. These parameters are described in the following subsections.

A. Area

The most trivial shape parameter is the area of a mass. In a digital binary image, the area is given by the number of pixels that belong to the mass. In the matrix or pixel-list representation of the mass, computing the area simply means counting the pixels. Alternatively, the area can be calculated from the chain code of the mass boundary.
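Both routes to the area can be sketched in Python. This is a minimal illustration, not the authors' Delphi code: it assumes a NumPy binary mask for pixel counting and an 8-direction Freeman chain code (codes 0 to 7, counterclockwise from east, y axis pointing up) for the boundary, and the function names are hypothetical. The two results are not identical: the chain-code polygon passes through the centres of the boundary pixels, so it encloses less area than the pixel count of the same region.

```python
import numpy as np

# Freeman 8-direction chain code: code k -> (dx, dy), counterclockwise from east.
# Assumed convention: the y axis points up.
STEPS = [(1, 0), (1, 1), (0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1)]


def area_from_mask(mask: np.ndarray) -> int:
    """Area as the number of pixels that belong to the mass (nonzero pixels)."""
    return int(np.count_nonzero(mask))


def area_from_chain_code(codes: list[int]) -> float:
    """Area of the polygon traced by a closed chain code (shoelace formula).

    The polygon runs through the centres of the boundary pixels, so the
    result is smaller than the pixel count of the same region.
    """
    x = y = 0
    twice_area = 0
    for c in codes:
        dx, dy = STEPS[c]
        # shoelace term x_i * y_{i+1} - x_{i+1} * y_i, simplified
        twice_area += x * dy - dx * y
        x, y = x + dx, y + dy
    if (x, y) != (0, 0):
        raise ValueError("chain code does not describe a closed boundary")
    return abs(twice_area) / 2


square = np.ones((3, 3), dtype=bool)
print(area_from_mask(square))                          # 9
print(area_from_chain_code([0, 0, 2, 2, 4, 4, 6, 6]))  # 4.0
```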

B. Perimeter

The perimeter is another geometric parameter, which can easily be obtained from the chain code of the mass boundary. It is only necessary to count the length of the chain code, taking into account that steps in diagonal directions are longer by a factor of $\sqrt{2}$. For an 8-neighborhood chain code, the perimeter $p$ is then given by (1):

$$p = n_e + \sqrt{2}\, n_o \tag{1}$$

where $n_e$ and $n_o$ are the numbers of even and odd chain code steps, respectively [5].
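Eq. (1) translates directly into code. A minimal sketch, assuming the usual Freeman convention in which even codes are horizontal or vertical unit steps and odd codes are diagonal steps; the function name is hypothetical.

```python
import math


def chain_code_perimeter(codes: list[int]) -> float:
    """Perimeter from an 8-neighbourhood Freeman chain code, Eq. (1).

    Even codes (0, 2, 4, 6) are horizontal/vertical steps of length 1;
    odd codes (1, 3, 5, 7) are diagonal steps of length sqrt(2).
    """
    n_e = sum(1 for c in codes if c % 2 == 0)
    n_o = len(codes) - n_e
    return n_e + math.sqrt(2) * n_o


print(chain_code_perimeter([0, 0, 2, 2, 4, 4, 6, 6]))  # 8.0
print(chain_code_perimeter([1, 3, 5, 7]))              # 4*sqrt(2), about 5.657
```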

C. Normalized Circularity

Area and perimeter are two parameters that describe the size of a mass in one way or another. To compare masses observed from different distances, it is important to use shape parameters that do not depend on the size of the mass on the image plane. The normalized circularity $c_N$ is one of the simplest parameters of this kind. It is defined in (2):

$$c_N = 1 - \frac{4\pi A}{p^2} \tag{2}$$

where $A$ is the area and $p$ is the perimeter of the mass. The normalized circularity equals zero for circles and tends to unity for complex shapes [6].
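The sketch below evaluates the normalized circularity, assuming the form $c_N = 1 - 4\pi A/p^2$, which is the definition consistent with the properties stated above (zero for a circle, approaching unity for complex shapes, independent of scale). The function name is hypothetical.

```python
import math


def normalized_circularity(area: float, perimeter: float) -> float:
    """c_N = 1 - 4*pi*A / p**2 (assumed form of Eq. (2)).

    0 for a perfect circle; approaches 1 as the boundary becomes more complex.
    """
    if perimeter <= 0:
        raise ValueError("perimeter must be positive")
    return 1.0 - 4.0 * math.pi * area / perimeter ** 2


r = 5.0
print(normalized_circularity(math.pi * r ** 2, 2 * math.pi * r))  # approximately 0 (circle)
print(normalized_circularity(1.0, 4.0))  # unit square: 1 - pi/4, about 0.215
```

Scaling a shape by a factor k multiplies A by k² and p² by k², so the ratio, and hence c_N, is unchanged. On a digitized mass the value for a round shape is only approximately zero, since A and p are both measured on the pixel grid.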

D. Fourier Descriptors and Fourier Coefficient A0

The Fourier coefficient A0 is computed in a way similar to the radial distance. The radial distance is measured by first detecting the centroid of the mass. The Euclidean distance from the centroid to the edge is then measured along the entire boundary: an arbitrary starting point is chosen and the boundary is followed clockwise. The radial distance is computed using (3):

$$d(i) = \sqrt{\left(x(i) - x_c\right)^2 + \left(y(i) - y_c\right)^2} \tag{3}$$

where $(x_c, y_c)$ are the coordinates of the centroid and $(x(i), y(i))$ are the coordinates of the boundary pixel at the $i$-th location. To compute A0, the Euclidean distance is measured as in the radial distance computation, but the starting point is chosen according to the orientation of the shape, and the boundary is followed clockwise in preselected angular increments, as shown in Fig. 2. In this study, a 1-degree increment is used, so the number of samples is N = 360.
Fig. 2. Computation of A0 for a malignant mass.
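The paper does not spell out the formula for A0. The sketch below assumes that A0 is the zero-frequency (DC) term of the discrete Fourier transform of the radial distance signature sampled at N = 360 one-degree steps, that is, its mean value, and that the higher-order descriptors are the magnitudes of the next harmonics. It also assumes nearest-neighbour angular sampling of the boundary and takes the starting angle and the centroid as inputs; all function names are hypothetical.

```python
import numpy as np


def radial_distance(boundary: np.ndarray, centroid: tuple[float, float]) -> np.ndarray:
    """Eq. (3): Euclidean distance from the centroid to each boundary pixel.

    boundary is an (M, 2) array of (x, y) boundary pixel coordinates.
    """
    xc, yc = centroid
    return np.hypot(boundary[:, 0] - xc, boundary[:, 1] - yc)


def angular_signature(boundary, centroid, start_deg=0.0, n=360):
    """Radial distance sampled every 360/n degrees, starting at start_deg.

    Assumption: for each sampling angle, the boundary pixel whose polar angle
    (about the centroid) is closest is used. In image coordinates (y down),
    increasing angle runs clockwise on screen.
    """
    b = np.asarray(boundary, dtype=float)
    xc, yc = centroid
    d = radial_distance(b, centroid)
    phi = np.degrees(np.arctan2(b[:, 1] - yc, b[:, 0] - xc)) % 360.0
    angles = (start_deg + np.arange(n) * 360.0 / n) % 360.0
    # wrapped angular difference between each sampling angle and each pixel, shape (n, M)
    diff = np.abs((phi[None, :] - angles[:, None] + 180.0) % 360.0 - 180.0)
    return d[np.argmin(diff, axis=1)]


def fourier_coefficients(signature, k_max=10):
    """Return A0 (assumed: the DC term, i.e. the mean radial distance) and the
    magnitudes of harmonics 1..k_max of the sampled signature."""
    F = np.fft.fft(signature) / len(signature)
    return F[0].real, np.abs(F[1:k_max + 1])


# Usage: a circle of radius 10 has A0 = 10 and no higher harmonics.
t = np.linspace(0, 2 * np.pi, 720, endpoint=False)
circle = np.column_stack([50 + 10 * np.cos(t), 50 + 10 * np.sin(t)])
a0, harmonics = fourier_coefficients(angular_signature(circle, (50.0, 50.0)))
print(round(float(a0), 6), bool(harmonics.max() < 1e-9))  # 10.0 True
```

Under this assumption A0 does not depend on the starting angle, since a circular shift of the signature leaves the DC term unchanged; an orientation-based starting point affects only the phases of the higher harmonics.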

III. Classification Scheme

Pattern recognition applications come in many forms. In some instances, there is an underlying and quantifiable statistical basis for the generation of patterns. In other instances, the underlying structure of the pattern provides the information fundamental to pattern recognition. In still others, neither of these cases holds, but designers are able to develop and 'train' an architecture to correctly associate input patterns with desired responses [7].

Statistical pattern recognition assumes a statistical basis for the classification algorithms. A set of characteristic measurements, called features, is extracted from the input data and used to assign each feature vector to one of c classes. The features are assumed to be generated by a state of nature, so the underlying model is a state-of-nature, or class-conditional, set of probabilities and/or probability density functions. The Bayesian classifier is an example of this approach.

The Bayes decision criterion is a systematic procedure that assigns a cost to each correct and incorrect decision and then minimizes the average cost. In this study, the costs are selected specifically so that more malignant cases are detected. A decision tree consisting of Bayesian classifiers, shown in Fig. 3, is implemented to classify both classes with as few errors as possible. ROC analysis is used to select the parameters of the Bayesian classifiers.
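A single node of such a tree can be sketched as a minimum-risk Bayes decision on one feature. The Gaussian class-conditional densities and the means, standard deviations, priors and costs below are all hypothetical illustration values, not parameters from this study, and the structure of the tree in Fig. 3 is not reproduced.

```python
import math


def gaussian_pdf(x: float, mean: float, std: float) -> float:
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))


def bayes_decide(x, params, priors, cost):
    """Minimum expected cost (Bayes risk) decision for a single feature value x.

    params: class -> (mean, std) of an assumed Gaussian likelihood p(x | class)
    priors: class -> prior probability P(class)
    cost:   (decision, true_class) -> cost of that outcome
    """
    joint = {c: gaussian_pdf(x, *params[c]) * priors[c] for c in params}
    total = sum(joint.values())
    posterior = {c: v / total for c, v in joint.items()}
    risk = {d: sum(cost[(d, t)] * posterior[t] for t in posterior) for d in params}
    return min(risk, key=risk.get)


# Hypothetical parameters for one feature (not the study's values).
params = {"benign": (0.20, 0.05), "malignant": (0.50, 0.10)}
priors = {"benign": 0.5, "malignant": 0.5}
equal_cost = {("benign", "benign"): 0, ("malignant", "malignant"): 0,
              ("benign", "malignant"): 1, ("malignant", "benign"): 1}
# A missed malignancy is made five times as costly as a false alarm.
skewed_cost = dict(equal_cost)
skewed_cost[("benign", "malignant")] = 5

print(bayes_decide(0.3, params, priors, equal_cost))   # benign
print(bayes_decide(0.3, params, priors, skewed_cost))  # malignant
```

At x = 0.3 the posterior favours benign two to one, yet raising the cost of a missed malignancy to five flips the decision; this is the effect of selecting the costs so that more malignant cases are detected.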

ROC analysis estimates a curve that describes the inherent tradeoff between the sensitivity and specificity of a diagnostic test [8]. Each point on the ROC curve is associated with a specific diagnostic criterion. The operating point will vary among observers because their diagnostic criteria vary, even when their ROC curves are the same. The area under the curve is a summary measure of the accuracy of the test [9]. The parameters are selected according to their areas under the ROC curves.
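The empirical ROC curve and its area can be computed directly from the feature values of cases with known labels. A minimal sketch, assuming larger feature values indicate malignancy (a feature for which the opposite holds would be negated first) and using the trapezoidal rule for the area; the data and feature names are hypothetical.

```python
def roc_points(scores, labels):
    """Empirical ROC curve as (false positive rate, true positive rate) pairs.

    labels: 1 = malignant (positive), 0 = benign. A case is called positive
    when its score is >= the threshold; every distinct score is tried.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


labels = [0, 0, 1, 1]
print(auc(roc_points([0.1, 0.4, 0.35, 0.8], labels)))  # 0.75

# Rank candidate features by AUC (hypothetical values for the same four cases).
features = {"feature_a": [0.1, 0.4, 0.35, 0.8], "feature_b": [0.1, 0.2, 0.8, 0.9]}
ranked = sorted(features, key=lambda f: auc(roc_points(features[f], labels)), reverse=True)
print(ranked)  # ['feature_b', 'feature_a']
```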


Fig. 3. The decision tree for classification

The developed software was then redesigned to implement the decision tree. A total of 25 mammographic mass shapes for which a biopsy had been performed were analyzed by the software, and its decisions are listed in Table I.

TABLE I
DECISIONS OF THE SOFTWARE

                          Biopsy result
Software decision     Malignant     Benign
Malignant                 9            1
Benign                    1           14
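Reading Table I with rows as the software decision and columns as the biopsy result, and taking malignant as the positive class, the usual summary figures follow directly. The short Python check below only restates that arithmetic; with 25 cases these proportions are rough estimates.

```python
# Counts from Table I (rows: software decision, columns: biopsy result).
tp = 9   # software malignant, biopsy malignant
fp = 1   # software malignant, biopsy benign
fn = 1   # software benign,    biopsy malignant
tn = 14  # software benign,    biopsy benign

sensitivity = tp / (tp + fn)                 # 9/10
specificity = tn / (tn + fp)                 # 14/15
accuracy = (tp + tn) / (tp + fp + fn + tn)   # 23/25

print(f"sensitivity={sensitivity:.3f} specificity={specificity:.3f} accuracy={accuracy:.2f}")
# sensitivity=0.900 specificity=0.933 accuracy=0.92
```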

IV. Mammogram Database

Database systems store information in every conceivable health care environment. From large tracking databases, such as hospital information systems, to an individual patient's examination report, database systems store and distribute the data on which health care depends. Today's generation of powerful, inexpensive workstation computers enables programmers to design software that maintains and distributes data quickly and inexpensively.

A. Relational Database Management System

Codd's idea for a Relational Database Management System (RDBMS) uses the mathematical concepts of relational algebra to break data down into sets and related common subsets. Because information can naturally be grouped into distinct sets, Dr. Codd organized the database system around this concept. Under the relational model, data is separated into sets that resemble a table structure. This table structure consists of individual data elements called columns or fields. A single set of a group of fields is known as a record or row [10]. Microsoft Access is a PC-only database product that contains many of the features of a relational database management system, and this is one reason why it was chosen for the mammogram database.

B. Open Database Connectivity

The unique feature of Open Database Connectivity (ODBC) is that none of its functions are database-vendor specific. ODBC allows databases to be accessed from remote clients over an intranet, and the most popular protocol used is TCP/IP [10]. The second reason for choosing Microsoft Access is its ODBC compatibility, which enables Internet access for telemedical use.

C. Structured Query Language

Structured Query Language (SQL) evolved to serve the concepts of the relational database model. SQL is a nonprocedural language, in contrast to procedural or third-generation languages such as COBOL and C. Nonprocedural means "what" rather than "how": SQL describes what data to retrieve, delete, or insert, rather than how to perform the operation. This property is the reason SQL is used in the developed database.

D. Tables in the Mammogram Database

The most important decision for a database designer, after the hardware platform and the RDBMS have been chosen, is the structure of the tables. Decisions made at this stage of the design can affect performance and programming later in the development process.


The mammogram database consists of nine tables in total, five of which are the main tables. The tables named patientfile, reports, findings, biopsies and doctors are mainly used to store the electronic patient record. In addition, the tables named assessments, recommendations, reasons and biopsytech are designed for the coding system of the software. The relationships between the tables are shown in Fig. 4.

1) Patient File Table: This table stores the morphologic data of the patient, including the patient's ID and photo, and the patient's medical history related to breast cancer, such as menarche and menopause ages, births before and after age 30, whether prior breast, ovarian, or endometrial cancers have been seen, and whether the patient's mother or sister has had symptoms of breast cancer.

2) Reports Table: The examination date, the name of the examining doctor, the breast composition, and the assessment, recommendation, and examination reason codes are stored in this table. The relationship with the patient file table is established via the patient ID, and another relationship is established via the mass findings ID.

3) Findings Table: The fields of the findings table hold the mammographic mass finding data: the image and location of the mass; the normalized circularity, area, and Fourier coefficient A0 parameters of the mass; the software decision; and the ID of the biopsy, if one was performed.

4) Biopsies Table: The biopsy data, consisting of the technique used, the classification, and the pathological size of the mass, are stored in this table.
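The table relationships and a declarative query can be sketched with Python's built-in sqlite3 module standing in for Microsoft Access over ODBC. Only the table names come from the text; the column names, the subset of fields, and the sample row values are hypothetical, and the image fields are omitted.

```python
import sqlite3

# Hypothetical column names; only a subset of the fields described above.
SCHEMA = """
CREATE TABLE patientfile (
    patient_id    INTEGER PRIMARY KEY,
    tom_id        TEXT UNIQUE,       -- Touch Memory button ID
    menarche_age  INTEGER,
    menopause_age INTEGER
);
CREATE TABLE biopsies (
    biopsy_id      INTEGER PRIMARY KEY,
    technique      TEXT,
    classification TEXT,
    size_mm        REAL
);
CREATE TABLE findings (
    finding_id       INTEGER PRIMARY KEY,
    location         TEXT,
    norm_circularity REAL,
    area             REAL,
    a0               REAL,
    decision         TEXT,
    biopsy_id        INTEGER REFERENCES biopsies(biopsy_id)  -- NULL if no biopsy
);
CREATE TABLE reports (
    report_id  INTEGER PRIMARY KEY,
    patient_id INTEGER REFERENCES patientfile(patient_id),
    finding_id INTEGER REFERENCES findings(finding_id),
    exam_date  TEXT,
    assessment INTEGER               -- code into the assessments table
);
"""

QUERY = """
SELECT r.exam_date, f.norm_circularity, f.area, f.a0, f.decision, b.classification
FROM reports AS r
JOIN findings AS f ON f.finding_id = r.finding_id
LEFT JOIN biopsies AS b ON b.biopsy_id = f.biopsy_id
WHERE r.patient_id = ?
ORDER BY r.exam_date
"""


def findings_for_patient(con: sqlite3.Connection, patient_id: int) -> list[tuple]:
    """Describe *what* to retrieve; the SQL engine decides how to join and scan."""
    return con.execute(QUERY, (patient_id,)).fetchall()


con = sqlite3.connect(":memory:")
con.executescript(SCHEMA)
# Made-up sample rows for illustration only.
con.execute("INSERT INTO patientfile VALUES (1, NULL, 13, NULL)")
con.execute("INSERT INTO biopsies VALUES (1, 'core needle', 'benign', 12.0)")
con.execute("INSERT INTO findings VALUES (1, 'left upper outer', 0.21, 850.0, 17.5, 'benign', 1)")
con.execute("INSERT INTO reports VALUES (1, 1, 1, '2004-03-01', 3)")
print(findings_for_patient(con, 1))
# [('2004-03-01', 0.21, 850.0, 17.5, 'benign', 'benign')]
```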


F. Patient Privacy

Touch Memory (TOM) buttons are electronic memory chips contained in small, water-resistant, stainless steel canisters. Every Touch Memory button contains a unique, unalterable ID number that identifies it [11]. The unique ID number of the TOM is used as a tool that permits access to an electronic patient record in the mammogram database via the developed software. A DS9097 COM port adapter is employed as the TOM reader and is connected to the computer via an RS232 serial port.
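Touch Memory buttons report a 64-bit ROM ID made up of an 8-bit family code, a 48-bit serial number and an 8-bit CRC computed with the 1-Wire CRC-8 polynomial x^8 + x^5 + x^4 + 1. The sketch below shows one way software might validate that CRC before using the ID as a database key; reading the bytes through the DS9097 adapter is not shown, and the function names and the ID-to-patient mapping are hypothetical.

```python
from typing import Optional


def crc8_maxim(data: bytes) -> int:
    """1-Wire (Dallas/Maxim) CRC-8: polynomial x^8 + x^5 + x^4 + 1, processed LSB first."""
    crc = 0
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # 0x8C is the bit-reversed form of the polynomial's low bits (0x31)
            crc = ((crc >> 1) ^ 0x8C) if crc & 1 else (crc >> 1)
    return crc


def rom_id_key(rom: bytes) -> str:
    """Check an 8-byte ROM ID (family code, 48-bit serial, CRC) and return it
    as an upper-case hex string suitable as a database key."""
    if len(rom) != 8:
        raise ValueError("a ROM ID has exactly 8 bytes")
    if crc8_maxim(rom[:7]) != rom[7]:
        raise ValueError("CRC mismatch: the ID was misread")
    return rom.hex().upper()


def lookup_patient(rom: bytes, id_to_patient: dict[str, int]) -> Optional[int]:
    """Hypothetical lookup: button ID -> patient_id in the patientfile table."""
    return id_to_patient.get(rom_id_key(rom))


body = bytes([0x01, 0x12, 0x34, 0x56, 0x78, 0x9A, 0xBC])  # made-up family code + serial
rom = body + bytes([crc8_maxim(body)])
print(lookup_patient(rom, {rom_id_key(rom): 1}))  # 1
```

Checking the CRC first means a badly seated button, which can return corrupted bytes, is rejected instead of silently matching no record or the wrong one.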

V. Discussion

Mammography provides the best screening modality for detecting early breast cancer, even before a lesion is palpable. Because of the pathology of malignant masses, the shape of the mass can be used to discriminate between malignant and benign masses.

This study defines a classification scheme based on the geometric parameters of mammographic mass shapes, together with a BI-RADS-compatible mammogram database structure. Based on these definitions, 32-bit software has been developed for clinical use. The software makes decisions in a few seconds. In addition, it has a user-friendly graphical interface and runs on Windows operating systems.
E. Coding System


The coding system is defined by the assessments, recommendations, reasons and biopsytech tables. During the design of these tables, much attention was paid to the American College of Radiology's Breast Imaging Reporting and Data System (BI-RADS), since it contains a guide to standard coding.